Discounted Cumulated Gain Based Evaluation of Multiple-Query IR Sessions
Abstract
IR research has a strong tradition of laboratory evaluation of systems. Such research is based on test collections, pre-defined test topics, and standard evaluation metrics. While recent research has emphasized the user viewpoint by proposing user-based metrics and non-binary relevance assessments, these methods are insufficient for truly user-based evaluation. The common assumption of a single query per topic and session poorly represents real life. On the other hand, one well-known metric for multiple queries per session, instance recall, does not capture early (within-session) retrieval of (highly) relevant documents. We propose an extension to the Discounted Cumulated Gain (DCG) metric, the Session-based DCG (sDCG) metric, for evaluation scenarios involving multiple-query sessions, graded relevance assessments, and open-ended user effort, including decisions to stop searching. The sDCG metric discounts relevant results from later queries within a session. We exemplify the sDCG metric with data from an interactive experiment, discuss how the metric might be applied, and present research questions for which the metric is helpful.
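The abstract does not give the sDCG formula, so the following is only a minimal sketch of the idea it describes: graded gains are discounted by rank within each query, and each query's total is discounted again by its position in the session. The logarithmic discount form `1 / (1 + log_b(position))`, the function names, and the parameter names `b` and `bq` are assumptions made for illustration, not definitions taken from the text above.

```python
import math


def dcg(gains, b=2.0):
    """Discounted cumulated gain of one ranked result list.

    `gains` holds graded relevance scores in rank order.
    Assumed discount: 1 / (1 + log_b(rank)), so rank 1 is undiscounted.
    """
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain / (1.0 + math.log(rank, b))
    return total


def session_dcg(session, b=2.0, bq=4.0):
    """Session-level score: each query's DCG is discounted by query position.

    `session` is a list of gain lists, one per query, in the order issued.
    Assumed query discount: 1 / (1 + log_bq(q)), so the first query is
    undiscounted and later queries contribute progressively less.
    """
    total = 0.0
    for q, gains in enumerate(session, start=1):
        total += dcg(gains, b) / (1.0 + math.log(q, bq))
    return total


if __name__ == "__main__":
    # Hypothetical two-query session with gains on a 0-3 relevance scale.
    example_session = [[3, 0, 1], [0, 2]]
    print(round(session_dcg(example_session), 4))
```

Under these assumptions, a highly relevant document found by the first query contributes more than the same document found by a reformulated second query, which matches the behaviour the abstract attributes to the metric. A smaller `bq` would penalize reformulations more heavily.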
Similar resources
Binary and graded relevance in IR evaluations--Comparison of the effects on ranking of IR systems
In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed using a relevance scale with four levels: non-relevant, marginally relevant, fairly relevant, highly relevant. Twenty-one topics and 90 systems from TREC 7 and 20 topics and 121 systems from TREC 8 form the data. Binary precision, and ...
Interactive Analysis and Exploration of Experimental Evaluation Results
This paper proposes a methodology based on discounted cumulated gain measures and visual analytics techniques in order to improve the analysis and understanding of IR experimental evaluation results. The proposed methodology is geared to favour a natural and effective interaction of the researchers and developers with the experimental data and it is demonstrated by developing an innovative appl...
Cumulated Gain-based Indicators of IR Performance
Modern large retrieval environments tend to overwhelm their users with their large output. Since not all documents are of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation to the users. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their abi...
Simulations as a Means to Address Some Limitations of Laboratory-based IR Evaluation
We suggest using simulations to address some of the limitations of test collection-based IR evaluation. In the present paper we explore the effectiveness of short query sessions based on a graph-based view of the searching situation where potential queries (query key combinations) constitute the vertices of a graph G describing each topic. “Session strategies” are rules which determine the accep...
Cumulated Relative Position: A Metric for Ranking Evaluation
Measuring is a key to scientific progress. This is particularly true for research concerning complex systems. Multilingual and multimedia information access systems, such as search engines, are increasingly complex: they need to satisfy diverse user needs and support challenging tasks. Their development calls for proper evaluation methodologies to ensure that they meet the expected user require...
Publication date: 2008